Improved Transformer for High-Resolution GANs: Supplementary Material
Zhao, Long
We provide more architecture and training details of the proposed HiT, as well as additional experimental results, to help better understand our paper. We report detailed results in Table 1 on ImageNet 128×128. "Pixel shuffle" indicates the pixel shuffle operation; "blocking" indicates the blocking operation producing non-overlapping feature blocks. We use TensorFlow for implementation. We provide a detailed description of the generative process of the proposed HiT in Algorithm 1; see Algorithm 3 for more details about blocking and unblocking.

MQA (multi-query attention) is identical to multi-head attention except that the different heads share a single set of keys and values. X and Y are blocked feature maps, where m is the number of blocks and n is the block sequence length:

    def multi_query_attention(X, Y, W_q, W_k, W_v, W_o):
      """Multi-query attention over blocked feature maps.

      Args:
        X: a tensor used as query with shape [b, m, n, d]
        Y: a tensor used as key and value with shape [b, m, n, d]
        W_q: a tensor projecting query with shape [h, d, k]
        W_k: a tensor projecting key with shape [d, k]
        W_v: a tensor projecting value with shape [d, v]
        W_o: a tensor projecting output with shape [h, d, v]
      Returns:
        Z: a tensor with shape [b, m, n, d]
      """
      Q = tf.einsum("bmnd,hdk->bhmnk", X, W_q)  # per-head queries
      K = tf.einsum("bmnd,dk->bmnk", Y, W_k)    # keys shared by all heads
      V = tf.einsum("bmnd,dv->bmnv", Y, W_v)    # values shared by all heads
      logits = tf.einsum("bhmnk,bmjk->bhmnj", Q, K)
      weights = tf.nn.softmax(logits)           # attention over key positions
      O = tf.einsum("bhmnj,bmjv->bhmnv", weights, V)
      Z = tf.einsum("bhmnv,hdv->bmnd", O, W_o)  # combine heads, project to d
      return Z
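Since the multi-query attention described above is defined entirely by einsum contractions over the stated shapes, it can be sanity-checked with a NumPy stand-in for `tf.einsum`. This is an illustrative sketch, not the paper's implementation: the toy dimensions, random weights, and the 1/sqrt(k) logit scaling are assumptions added here.

```python
import numpy as np

def multi_query_attention_np(X, Y, W_q, W_k, W_v, W_o):
    """NumPy sketch of multi-query attention over blocked features.

    All heads share one set of keys/values (W_k, W_v carry no head axis).
    Shapes: b=batch, m=#blocks, n=block length, h=heads,
    d=model dim, k=key dim, v=value dim.
    """
    Q = np.einsum("bmnd,hdk->bhmnk", X, W_q)            # per-head queries
    K = np.einsum("bmnd,dk->bmnk", Y, W_k)              # shared keys
    V = np.einsum("bmnd,dv->bmnv", Y, W_v)              # shared values
    logits = np.einsum("bhmnk,bmjk->bhmnj", Q, K) / np.sqrt(Q.shape[-1])
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # softmax over keys
    O = np.einsum("bhmnj,bmjv->bhmnv", weights, V)
    return np.einsum("bhmnv,hdv->bmnd", O, W_o)         # back to model dim

# toy shapes: b=2, m=3 blocks of length n=4, h=2 heads, d=8, k=v=5
rng = np.random.default_rng(0)
b, m, n, h, d, k, v = 2, 3, 4, 2, 8, 5, 5
Z = multi_query_attention_np(
    rng.normal(size=(b, m, n, d)), rng.normal(size=(b, m, n, d)),
    rng.normal(size=(h, d, k)), rng.normal(size=(d, k)),
    rng.normal(size=(d, v)), rng.normal(size=(h, d, v)))
assert Z.shape == (b, m, n, d)
```

Because the key/value projections lack a head axis, the K and V tensors are computed once and broadcast across all h heads in the logit contraction, which is exactly the memory saving multi-query attention is designed for.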
A Additional Results
FID is evaluated over 10k samples instead of 50k for efficiency. It is thus important to compare our method's compute requirements to those of competing methods. Table 7 reports the throughput of our ImageNet models, measured in images per V100-second; we include communication time across two machines whenever our training batch size does not fit on a single machine. We find that a naive implementation of our models in PyTorch 1.7 is very inefficient, utilizing only a fraction of the available GPU compute. In addition, we can train for many fewer iterations while maintaining sample quality superior to BigGAN-deep with the same or lower compute budget.
EM Distillation for One-step Diffusion Models
Xie, Sirui, Xiao, Zhisheng, Kingma, Diederik P, Hou, Tingbo, Wu, Ying Nian, Murphy, Kevin Patrick, Salimans, Tim, Poole, Ben, Gao, Ruiqi
While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process. We further reveal an interesting connection between our method and existing methods that minimize the mode-seeking KL. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.
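The contrast the abstract draws between maximum-likelihood distillation and mode-seeking KL minimization can be made concrete numerically: minimizing the reverse KL(q||p) rewards a student q that locks onto a single mode of a multimodal teacher p, while the forward KL(p||q) heavily penalizes any mode q misses. The bimodal teacher and the two candidate students below are made-up illustrations, not EMD itself.

```python
import numpy as np

def kl(a, b, dx):
    # discrete approximation of KL(a || b) on a uniform grid
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
gauss = lambda mu, s: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(-3, 0.7) + 0.5 * gauss(3, 0.7)  # bimodal "teacher"
q_mode = gauss(3, 0.7)                          # student covering one mode
q_wide = gauss(0, 3.2)                          # student spread over both modes

# Reverse KL (mode-seeking) prefers the single-mode student ...
assert kl(q_mode, p, dx) < kl(q_wide, p, dx)
# ... while forward KL (mass-covering) punishes the dropped mode.
assert kl(p, q_mode, dx) > kl(p, q_wide, dx)
```

The single-mode student pays only about log 2 under the reverse KL (it sits where p has half its mass) yet an enormous penalty under the forward KL, which is the "fail to capture the full distribution" failure mode the abstract refers to.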
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Diffusion Models Beat GANs on Image Synthesis
Dhariwal, Prafulla, Nichol, Alex
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512. We release our code at https://github.com/openai/guided-diffusion.
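The classifier-guidance rule the abstract describes shifts each reverse-diffusion step's mean toward higher classifier probability, mu + s * Sigma * grad log p(y|x), where s is the guidance scale controlling the diversity/quality trade-off. The 1-D logistic "classifier" below is a made-up stand-in to show the mechanics of the mean shift; it is not the paper's classifier.

```python
import numpy as np

def classifier_logp_grad(x, w=2.0):
    # gradient of log p(y=1|x) for a toy logistic classifier sigmoid(w*x):
    # d/dx log sigmoid(w*x) = w * sigmoid(-w*x)
    return w / (1.0 + np.exp(w * x))

def guided_mean(mu, sigma2, s=1.0):
    # classifier guidance: shift the reverse-step mean by
    # s * Sigma * grad log p(y|x), evaluated here at the unguided mean
    return mu + s * sigma2 * classifier_logp_grad(mu)

mu, sigma2 = 0.0, 0.5
for s in (0.0, 1.0, 4.0):
    print("scale", s, "->", guided_mean(mu, sigma2, s))
```

With s = 0 the step is the ordinary unconditional update; raising s pushes samples further toward the classifier's preferred region (here, positive x), trading diversity for class-consistent quality exactly as described.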
- Research Report (0.64)
- Overview (0.46)